Supervised Entity Tagger for Indonesian Labor Strike Tweets using Oversampling Technique and Low Resource Features
Similar resources
Sentiment Analysis for Low Resource Languages: A Study on Informal Indonesian Tweets
This paper describes our attempt to build a sentiment analysis system for Indonesian tweets. With this system, we can study and identify sentiments and opinions in a text or document computationally. We used four thousand manually labeled tweets collected in February and March 2016 to build the model. Because of the variety of content in tweets, we analyze tweets into eight groups in total, inc...
Character Embeddings PoS Tagger vs HMM Tagger for Tweets
The paper describes our submissions to the task on PoS tagging for Italian Social Media Texts (PoSTWITA) at Evalita 2016. We compared two approaches: a traditional HMM trigram PoS tagger and a Deep Learning PoS tagger using both character-level and word-level embeddings. The character-level embeddings performed better, proving that they can provide a finer representation of words that a...
Entity Linking for Tweets
We study the task of entity linking for tweets, which tries to associate each mention in a tweet with a knowledge base entry. Two main challenges of this task are the dearth of information in a single tweet and the rich entity mention variations. To address these challenges, we propose a collective inference method that simultaneously resolves a set of mentions. Particularly, our model integrat...
Named Entity Recognition using an HMM-based Chunk Tagger
This paper proposes a Hidden Markov Model (HMM) and an HMM-based chunk tagger, from which a named entity (NE) recognition (NER) system is built to recognize and classify names, times and numerical quantities. Through the HMM, our system is able to apply and integrate four types of internal and external evidence: 1) simple deterministic internal feature of the words, such as capitalization and ...
Fast and Robust POS tagger for Arabic Tweets Using Agreement-based Bootstrapping
Part-of-Speech (POS) tagging is a key step in many NLP algorithms. However, tweets are difficult to POS tag because they are short, are not always written maintaining formal grammar and proper spelling, and abbreviations are often used to overcome their restricted lengths. Arabic tweets also show a further range of linguistic phenomena such as usage of different dialects, romanised Arabic and b...
Journal
Journal title: TELKOMNIKA (Telecommunication Computing Electronics and Control)
Year: 2016
ISSN: 2302-9293,1693-6930
DOI: 10.12928/telkomnika.v14i4.3876